Transaction Management Support in the VMS Operating System Kernel

Authors

  • William A. Laing
  • James E. Johnson
  • Robert V. Landau
Abstract

Distributed transaction management support is an enhancement to the VMS operating system. This support provides services in the VMS operating system for atomic transactions that may span multiple resource managers, such as those for flat files, network databases, and relational databases. These transactions may also be distributed across multiple nodes in a network, independent of the communications mechanisms used by either the application programs or the resource managers. The Digital distributed transaction manager (DECdtm) services implement an optimized variant of the two-phase commit protocol to ensure transaction atomicity. Additionally, these services take advantage of the unique VAXcluster capabilities to greatly reduce the potential for blocking that occurs with the traditional two-phase commit protocol. These features, now part of the VMS operating system, are readily available to multiple resource managers and to many applications outside the traditional transaction processing monitor environment.

Introduction

Businesses are becoming critically dependent on the availability and integrity of data stored on computer systems. As these businesses expand and merge, they acquire ever greater amounts of on-line data, often on disparate computer systems and often in disparate databases. The Digital distributed transaction manager (DECdtm) services described in this paper address the problem of integrating data from multiple computer systems and multiple databases while maintaining data integrity under transaction control.

The DECdtm services are a set of transaction processing features embedded in the VMS operating system. These services support distributed atomic transactions and implement an optimized variant of the well-known two-phase commit protocol.

Digital Technical Journal Vol. 3 No. 1 Winter 1991

Design Goals

Our overall design goal was to provide base services on which higher layers of software could be built. This software would support reliable and robust applications, while maintaining data integrity.

Many researchers report that an atomic transaction is a very powerful abstraction for building robust applications that consistently update data. [1,2] Supporting such an abstraction makes it possible both to respond to partial failures and to maintain data consistency. Moreover, a simplifying abstraction is crucial when one is faced with the complexity of a distributed system.

With increasingly reliable hardware and the influx of more general-purpose, fault-tolerant systems, the focus on reliability has shifted from hardware to software. [3] Recent discussions indicate that the key requirements for building systems with a 100-year mean time between failures may be (1) software-fault containment, using processes, and (2) software-fault masking, using process checkpointing and transactions. [4]

It was clear that we could use transactions as a pervasive technique to increase application availability and data consistency. Further, we saw that this technique had merit in a general-purpose operating system that supports transaction processing, as well as timesharing, office automation, and technical computing.

The design of DECdtm services also reflects several other Digital and VMS design strategies:

  • Pervasive availability and reliability. As organizations become increasingly dependent on their information systems, the need for all applications to be universally available and highly reliable increases. Features that ensure application availability and data integrity, such as journaling and two-phase commit, must be available to all applications, and not limited to those traditionally thought of as "transaction processing."
  • Operating environment consistency. Embedding features in the operating system that are required by a broad range of utilities ensures consistency in two areas: first, in the functionality across all layered software products, and, second, in the interface for developers. For instance, if several distributed database products require the two-phase commit protocol, incorporating the protocol into the underlying system allows programmers to focus on providing "value-added" features for their products instead of re-creating a common routine or protocol.
  • Flexibility and interoperability. Our vision includes making the DECdtm interfaces available to any developer or customer, allowing a broad range of software products to take advantage of the VMS environment. Future DECdtm services are also being designed to conform to de facto and international standards for transaction processing, thereby ensuring that VMS applications can interoperate with applications on other vendors' systems.

Transaction Manager: Some Definitions

To grasp the concept of a transaction manager, some basic terms must first be understood:

  • Resource manager. A software entity that controls both the access and recovery of a resource. For example, a database manager serves as the resource manager for a database.
  • Transaction. The execution of a set of operations with the properties of atomicity, serializability, and durability on recoverable resources.
  • Atomicity. Either all the operations of a transaction complete, or the transaction has no effect at all.
  • Serializability. All operations that executed for the transaction must appear to execute serially, with respect to every other transaction.
  • Durability. The effects of operations that executed on behalf of the transaction are resilient to failures.

A transaction manager supports the transaction abstraction by providing the following services:

  • Demarcation operations to start, commit, and abort a transaction
  • Execution operations for resource managers to declare themselves part of a transaction and for transaction branch managers to declare the distribution of a transaction
  • Two-phase commit operations for resource managers and other transaction managers to change the transaction state (to either "preparing" or "committing") or to acknowledge receipt of a request to change state

Benefits of Embedding Transaction Semantics in the Kernel

Several benefits are achieved by embedding transaction semantics in the kernel of the VMS operating system. Briefly, these benefits include consistency, interoperability, and flexibility. Embedding transaction semantics in the kernel makes a set of services available to different environments and products in a consistent manner. As a consequence, interoperability between products is encouraged, as well as investment in the development of "value-added" features. The inherent flexibility allows a programmer to choose a transaction processing monitor, such as VAX ACMS, and to access multiple databases anywhere in the network. The programmer may also write an application that reads a VAX DBMS CODASYL database, updates an Rdb/VMS relational database, and writes report records to a sequential VAX RMS file, all in a single transaction. Because all database and transaction processing products use DECdtm services, a failure at any point in the transaction causes all updates to be backed out and the files to be restored to their original state.

Two-phase Commit Protocol

DECdtm services use an optimized variant of the technique referred to as two-phase commit. The technique is a member of the class of protocols known as Atomic Commit Protocols. This class guarantees two outcomes: first, a single yes or no decision is reached among a distributed set of participants; and, second, this decision is consistently propagated to all participants, regardless of subsequent machine or communications failures. This guarantee is used in transaction processing to help achieve the atomicity property of a transaction.

The basic two-phase commit protocol is straightforward and well known. It has been the subject of considerable research and technical literature for several years. [5, 6, 7, 8, 9] The following section describes in detail this general two-phase commit protocol for those who wish to have more information on the subject.

The Basic Two-phase Commit Protocol

The two-phase commit protocol occurs between two types of participants: one coordinator and one or more subordinates. The coordinator must arrive at a yes or no decision (typically called the "commit decision") and propagate that decision to all subordinates, regardless of any ensuing failures. Conversely, the subordinates must maintain certain guarantees (as described below) and must defer to the coordinator for the result of the commit decision. As the name suggests, two-phase commit occurs in two distinct phases, which the coordinator drives.

In the first phase, called the prepare phase, the coordinator issues "requests to prepare" to all subordinates. The subordinates then vote, either a "yes vote" or a "veto." Implicit in a "yes vote" is the guarantee that the subordinate will neither commit nor abort the transaction (decide yes or no) without an explicit order from the coordinator. This guarantee must be maintained despite any subsequent failures and usually requires the subordinate to place sufficient data on disk (prior to the "yes vote") to ensure that the operations can be either completed or backed out.

The second phase, called the commit phase, begins after the coordinator receives all expected votes. Based on the subordinate votes, the coordinator decides to commit if there are no "veto" votes; otherwise, it decides to abort. The coordinator propagates the decision to all subordinates as either an "order to commit" or an "order to abort." Because the coordinator's decision must survive failures, a record of the decision is usually stored on disk before the orders are sent to the subordinates. When the subordinates complete processing, they send an acknowledgment back to the coordinator that they are "done." This allows the coordinator to reclaim disk storage from completed transactions. Figure 1 shows a time line of the two-phase commit sequence.

A subordinate node may also function as a superior (intermediate) node to follow-on subordinates. In such cases, there is a tree-structured relationship between the coordinator and the full set of subordinates. Intermediate nodes must propagate the messages down the tree and collect responses back up the tree. Figure 2 shows a time line for a two-phase commit sequence with an intermediate node.

Most of us have had direct contact with the two-phase commit protocol. It occurs in many activities. Consider the typical wedding ceremony as presented below, which is actually a very precise two-phase commit.

Official: Will you, Mary, take John...?
Bride: I will.
Official: Will you, John, take Mary...?
Groom: I will.
Official: I now pronounce you man and wife.

The above dialog can be viewed as a two-phase commit:

Coordinator: Request to Prepare?
Participant 1: Yes Vote.
Coordinator: Request to Prepare?
Participant 2: Yes Vote.
Coordinator: Commit Decision. Order to Commit.

The basic two-phase commit protocol is straightforward, survives failures, and produces a single, consistent yes or no decision. However, this protocol is rarely used in commercial products. Optimizations are often applied to minimize message exchanges and physical disk writes. These optimizations are particularly important to the transaction processing market because the market is very performance sensitive, and two-phase commit occurs after the application is complete. Thus, two-phase commit is reasonably considered an added overhead cost. We have endeavored to reduce the cost in a number of ways, resulting in low overhead and a scalable protocol embodied in the DECdtm services. Some of the optimizations are described later in another section.

Components of the DECdtm Services

The DECdtm services were developed as three separate components: a transaction manager, a log manager, and a communication manager. Together, these components provide support for distributed transaction management. The transaction manager is the central component. The log manager services enable the transaction manager to store data on nonvolatile storage. The communication manager provides a location-independent interprocess communication service used by the transaction and log managers. Figure 3 shows the relationships among these components.

The Digital Distributed Transaction Manager

As the central component of the DECdtm services, the transaction manager is responsible for the application interface to the DECdtm services. This section presents the system services the transaction manager comprises.

The transaction coordinator is the core of the transaction manager. It implements the transaction state machine and knows which resource managers and subordinate transaction managers are involved in a transaction. The coordinator also controls what is written to nonvolatile storage and manages the volatile list of active transactions.

The user services are routines that implement the START_TRANSACTION, END_TRANSACTION, and ABORT_TRANSACTION transaction system services. They validate user parameters, dispense a transaction identifier, pass state transition requests to the transaction coordinator, and return information about the transaction outcome.

The branch management services support the creation and demarcation of branches in the distributed transaction tree. New branches are constructed when subordinate application programs are invoked in a distributed environment. The services are called on to attach an application program to the transaction, to demarcate the work done in that application as part of the transaction, and finally to return information about the transaction outcome.

The resource manager services are routines that provide the interface between the DECdtm services and the cooperating resource managers. This interface allows resource managers to declare themselves to the transaction manager and to register their involvement in the "voting" stage of the two-phase commit process of a specific transaction.

Finally, the information services routines are the interface that allows resource managers to query and update transaction information stored by DECdtm services. This information is stored in either the volatile active-transaction list or the nonvolatile transaction log. Resource managers may resolve and possibly modify the state of "in-doubt" transactions through these services.

The Log Manager

The log manager provides the transaction manager with an interface for storing sufficient information in nonvolatile storage to ensure that the outcome of a transaction can be consistently resolved. This interface is available to operating system components.

The log manager also supports the creation, deletion, and general management of the transaction logs used by the transaction manager. An additional utility enables operators to examine transaction logs and, in extreme cases, makes it possible to change the state of any transaction.

The Communication Manager

The communication manager provides a command/response message-passing facility to the transaction manager and the log manager. The interface is specifically designed to offer high-performance, low-latency services to operating system components. The command/response, connection-oriented, message-passing system allows clients to exchange messages. The clients may reside on the same node, within the same cluster, or within a homogeneous VMS wide area network. The communication manager also provides highly optimized local (that is, intranode) and intracluster transports. In addition, this service component multiplexes communication links across a single, cached DECnet virtual circuit to improve the performance of creating and destroying wide area links.

Transaction Processing Model

Digital's transaction processing model entails the cooperation of several distinct elements for correct execution of a distributed transaction. These elements are (1) the application programmer, (2) the resource managers, (3) the integration of the DECdtm services into the VMS operating system, (4) transaction trees, and (5) vote-gathering and the final outcome.

Application Programmer

The application programmer must bracket a series of operations with START_TRANSACTION and END_TRANSACTION calls. This bracketing demarcates the unit of work that the system is to treat as a single atomic unit. The application programmer may call the DECdtm services to create the branches of the distributed transaction tree.

Resource Managers

Resource managers, such as VAX RMS, VAX Rdb/VMS, and VAX DBMS, that access recoverable resources during a transaction inform the DECdtm services of their involvement in the transaction. The resource managers can then participate in the voting phase and react appropriately to the decision on the final outcome of the transaction. Resource managers must also provide recovery mechanisms to restore resources they manage to a transaction-consistent state in the event of a failure.

Integration in the Operating System

The DECdtm services are a basic component of the VMS operating system. These services are responsible for maintaining the overall state of the distributed transaction and for ensuring that sufficient information is recorded on stable storage. Such information is essential in the event of a failure so that resource managers can obtain a consistent view of the outcome of transactions.

Each VMS node in a network normally contains one transaction manager object. This object maintains a list of participants in transactions that are active on the node. This list consists of resource managers local to the node and the transaction manager objects located on other nodes.

Transaction Trees

The node on which the transaction originated (that is, the node on which the START_TRANSACTION service was called) may be viewed as the "root" of a distributed transaction tree. The transaction manager object on this node is usually responsible for coordinating the commit phase of the transaction. The transaction tree grows as applications call on the branch management services of the transaction manager object.

The transaction identifier dispensed by the START_TRANSACTION service is an input parameter to the branch services. This parameter identifies two concerns for the local transaction manager object: (1) to which transaction tree the new branch should be added, and (2) which transaction manager object is the immediate superior in the tree.

Resource managers join specific branches in a transaction tree by calling the resource manager services of the local transaction manager object.

Vote-Gathering and the Final Outcome

When the "commit" phase of the transaction is entered (triggered by an application call to END_TRANSACTION), each transaction manager object involved in the transaction must gather the "votes" of the locally registered resource managers and the subordinate transaction manager objects. The results are forwarded to the coordinating transaction manager object.

The coordinating transaction manager object eventually informs the locally registered resource managers and the subordinate transaction manager objects of the final outcome of the transaction. The subordinate transaction manager objects, in turn, propagate this information to locally registered resource managers as well as to any subordinate transaction manager objects.

Protocol Optimizations

The DECdtm services use several previously published optimizations and extend those optimizations with a number that are unique to VAXcluster systems. In this section we present these general optimizations, a discussion of VAXcluster considerations, and two VAXcluster-specific optimizations.

General Optimizations

The following sections describe some previously published optimizations.

Presumed Abort. DECdtm services use the "presumed abort" optimization. [8, 9] This optimization states that, if no information can be found for a transaction by the coordinator, the transaction aborts. This removes the need to write an abort decision to disk and to subsequently acknowledge the order to abort. In addition, subordinates that do not modify any data during the transaction (that is, they are "read only") avoid writing information to disk or participating in the commit phase.

Lazy Commit Log Write. The DECdtm services can act as intermediate nodes in a distributed transaction. In this mode, they write a "prepare" record prior to responding with a "yes vote." They also write a "commit" record upon receipt of an order to commit. This latter record is written so that the coordinator need not be asked about the commit decision should the intermediate node fail. This refinement isolates the intermediate node's recovery from communication failures between it and the coordinator.

Performance is enhanced when the DECdtm services write the commit record on an intermediate node in a "nonurgent" or "lazy" manner. [10] The lazy write buffers the information and waits for an urgent request to trigger the group commit timer to write the data to disk. Typically, this operation avoids a disk write at the intermediate node. The increase in the length of time before the commit record is written is negligible.

One-Phase Commit. A key consideration in the design of the DECdtm services was to incur minimal impact on the performance of Digital's database products. We exploited two attributes to achieve this goal. First, all current users are limited to non-distributed transactions (those that involve only a single subordinate). Second, the two-phase commit protocol requires that all subordinates respond with a "yes vote" to commit the transaction. This allows a highly optimized path for single subordinate transactions. Such transactions require no writes to disk by the DECdtm services and execute in one phase. The subordinate is told that it is the only voting party in the transaction and, if it is willing to respond with a "yes vote," it should proceed and perform its order to commit processing.

VAXcluster Considerations

The optimizations listed above (and others not described here) provide the DECdtm services with a competitive two-phase commit protocol. VAXcluster technology, though, offers other untapped potential. VAXcluster systems offer several unique features, in particular, the guarantee against partitioning, the distributed lock manager, and the ability to share disk access between CPUs. [11]

Within a VAXcluster system, use of these unique features allows the DECdtm services to avoid a blocked condition which occurs during the short period of time when a subordinate node responds with a "yes vote" and communication with its coordinator is lost. Normally, the subordinate is unable to proceed with that transaction's commit until communications have been restored.

Outside a VAXcluster system, the DECdtm services would indeed be blocked. If, however, the subordinate and its coordinator are in the same VAXcluster system, this will not occur. If communication is lost, a subordinate node knows, as a result of the guarantee against partitioning, that its coordinator has failed. Because a subordinate node can access the transaction log of the failed coordinator, it may immediately "host" its failed coordinator's recovery. Communications to the hosted coordinator are quickly restored, and the subordinate node is able to complete the transaction commit.

VAXcluster-specific Optimizations

Once the blocking potential was removed from intra-VAXcluster transactions, several additional protocol optimizations became practical. The optimizations described in this section are dynamically enabled if the subordinate and its coordinator are both in the same VAXcluster system.

Early Prepare Log Write. As noted earlier, an intermediate node must write a "prepare" record prior to responding with a "yes vote." The presence of this record in an intermediate node's log indicates that the node must get the outcome of the transaction from the coordinator and, thus, it is subject to blocking. Therefore, the prepare record is typically written after all the expected votes are returned, which adds to commit-time latency.

The DECdtm services are free from blocking concerns within a VAXcluster system; the vast majority of transactions do commit. This factor prompted an optimization that writes a prepare record while simultaneously collecting the subordinate votes. This reduces commit-time latency.

No Commit Log Write. The lazy commit log write optimization described above causes the intermediate node's commit record to be written and, thus, minimizes the potential for blocking should the intermediate node fail. Note that this is not a concern for the intra-VAXcluster case. Therefore, no commit record is written at the intermediate node.

Performance Evaluation

Table 1 describes the message and log write costs of the DECdtm services protocol and compares it to the basic two-phase commit protocol, as well as to the standard presumed abort variant previously described. [8,9]
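
As a rough illustration of where log writes are saved, the sketch below models two of the general optimizations discussed earlier, presumed abort (including the read-only case) and one-phase commit, and counts the coordinator's log writes. It is a simplified model under stated assumptions, not the DECdtm implementation: the `Participant` and `Coordinator` classes, their methods, and the state names are hypothetical, and a dictionary stands in for the coordinator's transaction log.

```python
class Participant:
    """Hypothetical subordinate; `vote` is "yes" or "veto"."""

    def __init__(self, vote="yes", read_only=False):
        self.vote = vote
        self.read_only = read_only
        self.state = "active"

    def prepare(self):
        if self.read_only:
            # Read-only participants have nothing to commit or back out,
            # so they drop out before the commit phase.
            self.state = "done"
            return "read-only"
        self.state = "prepared" if self.vote == "yes" else "aborted"
        return self.vote

    def commit_one_phase(self):
        # The sole voting party decides and completes the transaction itself.
        self.state = "committed" if self.vote == "yes" else "aborted"
        return self.state

    def order(self, decision):
        self.state = decision


class Coordinator:
    def __init__(self):
        self.log = {}        # tid -> "committed"; aborts are never logged
        self.log_writes = 0

    def end_transaction(self, tid, participants):
        if len(participants) == 1:
            # One-phase commit: no coordinator log write, no separate vote.
            return participants[0].commit_one_phase()
        votes = [(p, p.prepare()) for p in participants]
        voters = [p for p, v in votes if v != "read-only"]
        if any(v == "veto" for _, v in votes):
            # Presumed abort: send the orders but write nothing to the log.
            for p in voters:
                p.order("aborted")
            return "aborted"
        if voters:
            # Only a commit decision with updating participants is logged.
            self.log[tid] = "committed"
            self.log_writes += 1
        for p in voters:
            p.order("committed")
        return "committed"

    def resolve(self, tid):
        # Presumed abort: no record found means the transaction aborted.
        return self.log.get(tid, "aborted")
```

In this model, `resolve` answers an in-doubt participant's query after a failure: any transaction without a logged commit record, including one the coordinator has never heard of, is reported as aborted.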


Similar resources

Push: An experimental facility for implementing distributed database services in operating systems

Distributed database systems need special operating system support. Support routines can be implemented inside the kernel or at the user level. Kernel-level functions, while efficient, are hard to implement. User-level implementations are easier, but suffer from poor performance and lack of security. This paper proposes a new approach to supplement or modify kernel facilities for database transact...


Coordinated and Secure Server Consolidation Using Virtual Machines

Server consolidation using virtual machines (VMs) can improve resource utilization by sharing physical resources. Each VM is isolated from the others for security and VMs can be easily migrated for load balancing. Since there are several VMs in a physical machine, the virtual machine monitor (VMM) multiplexes the physical resources among VMs according to system settings. The administrators dete...


A Model and Prototype of VMS Using the Mach 3.0 Kernel

Digital’s VMS operating system has been a successful software base for our VAX processors since the late 1970’s. Existing operating systems are facing many new requirements and challenges in the 1990’s and beyond. This has led us to investigate new approaches for designing, implementing, and maintaining VMS. One such effort is described in this paper. Using the Mach 3.0 kernel from Carnegie Mel...


The RelaX Architecture

RelaX (Reliable distributed applications support on UNIX) is a portable and extensible system software layer on top of UNIX-like operating system kernels supporting reliable distributed applications by a generalized transaction mechanism. The distributed transaction mechanism relieves each programmer of dealing explicitly with error recovery and concurrency control in every distributed applicat...


Designing an Optimized Transaction Commit Protocol

Digital's database products, VAX Rdb/VMS and VAX DBMS, share the same database kernel called KODA. KODA uses a grouping mechanism to commit many concurrent transactions together. In addition to other database services, KODA provides the transaction capabilities and commit processing for these two products. In this paper, we address some of the issues relevant to efficient co...



Journal title:
  • Digital Technical Journal

Volume 3  Issue 

Pages  -

Publication date 1991